{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "WfGpInivS0fG"
   },
   "source": [
    "<h2 align=\"center\">点击下列图标在线运行HanLP</h2>\n",
    "<div align=\"center\">\n",
    "\t<a href=\"https://colab.research.google.com/github/hankcs/HanLP/blob/doc-zh/plugins/hanlp_demo/hanlp_demo/zh/ner_restful.ipynb\" target=\"_blank\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n",
    "\t<a href=\"https://mybinder.org/v2/gh/hankcs/HanLP/doc-zh?filepath=plugins%2Fhanlp_demo%2Fhanlp_demo%2Fzh%2Fner_restful.ipynb\" target=\"_blank\"><img src=\"https://mybinder.org/badge_logo.svg\" alt=\"Open In Binder\"/></a>\n",
    "</div>\n",
    "\n",
    "## 安装"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "IYwV-UkNNzFp"
   },
   "source": [
    "无论是Windows、Linux还是macOS，HanLP的安装只需一句话搞定："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "1Uf_u7ddMhUt"
   },
   "outputs": [],
   "source": [
    "pip install hanlp_restful -U"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "pp-1KqEOOJ4t"
   },
   "source": [
    "## 创建客户端"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "id": "0tmKBu7sNAXX"
   },
   "outputs": [],
   "source": [
    "from hanlp_restful import HanLPClient\n",
    "HanLP = HanLPClient('https://www.hanlp.com/api', auth=None, language='zh') # auth不填则匿名，zh中文，mul多语种"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "EmZDmLn9aGxG"
   },
   "source": [
    "#### 申请秘钥\n",
    "由于服务器算力有限，匿名用户每分钟限2次调用。如果你需要更多调用次数，[建议申请免费公益API秘钥auth](https://bbs.hanlp.com/t/hanlp2-1-restful-api/53)。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "elA_UyssOut_"
   },
   "source": [
    "## 命名实体识别"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "wxctCigrTKu-"
   },
   "source": [
    "同时执行所有标准的命名实体识别："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "Zo08uquCTFSk",
    "outputId": "21be671b-ead0-43c9-cc3a-32c305d8be29"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "  \"tok/fine\": [\n",
      "    [\"2021年\", \"HanLPv2.1\", \"为\", \"生产\", \"环境\", \"带来\", \"次\", \"世代\", \"最\", \"先进\", \"的\", \"多\", \"语种\", \"NLP\", \"技术\", \"。\"],\n",
      "    [\"阿婆主\", \"来到\", \"北京\", \"立方庭\", \"参观\", \"自然\", \"语义\", \"科技\", \"公司\", \"。\"]\n",
      "  ],\n",
      "  \"ner/msra\": [\n",
      "    [[\"2021年\", \"DATE\", 0, 1], [\"HanLPv2.1\", \"ORGANIZATION\", 1, 2]],\n",
      "    [[\"北京立方庭\", \"LOCATION\", 2, 4], [\"自然语义科技公司\", \"ORGANIZATION\", 5, 9]]\n",
      "  ],\n",
      "  \"ner/pku\": [\n",
      "    [],\n",
      "    [[\"北京\", \"ns\", 2, 3], [\"立方庭\", \"ns\", 3, 4], [\"自然语义科技公司\", \"nt\", 5, 9]]\n",
      "  ],\n",
      "  \"ner/ontonotes\": [\n",
      "    [[\"2021年\", \"DATE\", 0, 1], [\"次世代\", \"DATE\", 6, 8]],\n",
      "    [[\"北京\", \"FAC\", 2, 3], [\"立方庭\", \"LOC\", 3, 4], [\"自然语义科技公司\", \"ORG\", 5, 9]]\n",
      "  ]\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "print(HanLP('2021年HanLPv2.1为生产环境带来次世代最先进的多语种NLP技术。阿婆主来到北京立方庭参观自然语义科技公司。', tasks='ner*'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "每个四元组表示`[命名实体, 类型标签, 起始下标, 终止下标]`，下标指的是命名实体在单词数组中的下标，单词数组默认为第一个以`tok`开头的数组。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "cqEWnj_7p2Lf"
   },
   "source": [
    "任务越少，速度越快。如指定仅执行命名实体识别，默认MSRA标准："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 572
    },
    "id": "BqEmDMGGOtk3",
    "outputId": "33790ca9-7013-456f-c1cb-e5ddce90a457"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Token    \tNER Type        \n",
      "─────────\t────────────────\n",
      "2021年    \t───►DATE        \n",
      "HanLPv2.1\t───►ORGANIZATION\n",
      "为        \t                \n",
      "生产       \t                \n",
      "环境       \t                \n",
      "带来       \t                \n",
      "次        \t                \n",
      "世代       \t                \n",
      "最        \t                \n",
      "先进       \t                \n",
      "的        \t                \n",
      "多        \t                \n",
      "语种       \t                \n",
      "NLP      \t                \n",
      "技术       \t                \n",
      "。        \t                \n",
      "\n",
      "Tok\tNER Type        \n",
      "───\t────────────────\n",
      "阿婆主\t                \n",
      "来到 \t                \n",
      "北京 \t◄─┐             \n",
      "立方庭\t◄─┴►LOCATION    \n",
      "参观 \t                \n",
      "自然 \t◄─┐             \n",
      "语义 \t  │             \n",
      "科技 \t  ├►ORGANIZATION\n",
      "公司 \t◄─┘             \n",
      "。  \t                \n"
     ]
    }
   ],
   "source": [
    "HanLP('2021年HanLPv2.1为生产环境带来次世代最先进的多语种NLP技术。阿婆主来到北京立方庭参观自然语义科技公司。', tasks='ner').pretty_print()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "jj1Jk-2sPHYx"
   },
   "source": [
    "执行OntoNotes命名实体识别："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 572
    },
    "id": "1goEC7znPNkI",
    "outputId": "2a97331c-a5fb-4d3c-ccf2-ce2186616c57"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Token    \tNER Type\n",
      "─────────\t────────\n",
      "2021年    \t───►DATE\n",
      "HanLPv2.1\t        \n",
      "为        \t        \n",
      "生产       \t        \n",
      "环境       \t        \n",
      "带来       \t        \n",
      "次        \t◄─┐     \n",
      "世代       \t◄─┴►DATE\n",
      "最        \t        \n",
      "先进       \t        \n",
      "的        \t        \n",
      "多        \t        \n",
      "语种       \t        \n",
      "NLP      \t        \n",
      "技术       \t        \n",
      "。        \t        \n",
      "\n",
      "Tok\tNER Typ\n",
      "───\t───────\n",
      "阿婆主\t       \n",
      "来到 \t       \n",
      "北京 \t───►FAC\n",
      "立方庭\t───►LOC\n",
      "参观 \t       \n",
      "自然 \t◄─┐    \n",
      "语义 \t  │    \n",
      "科技 \t  ├►ORG\n",
      "公司 \t◄─┘    \n",
      "。  \t       \n"
     ]
    }
   ],
   "source": [
    "HanLP('2021年HanLPv2.1为生产环境带来次世代最先进的多语种NLP技术。阿婆主来到北京立方庭参观自然语义科技公司。', tasks='ner/ontonotes').pretty_print()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "XOsWkOqQfzlr"
   },
   "source": [
    "为已分词的句子执行命名实体识别："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 161
    },
    "id": "bLZSTbv_f3OA",
    "outputId": "6a0e1e76-f581-4fd1-8a78-ef97d9429e87"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Token   \tNER Type        \n",
      "────────\t────────────────\n",
      "阿婆主     \t                \n",
      "来到      \t                \n",
      "北京立方庭   \t───►LOCATION    \n",
      "参观      \t                \n",
      "自然语义科技公司\t───►ORGANIZATION\n",
      "。       \t                \n"
     ]
    }
   ],
   "source": [
    "HanLP(tokens=[[\"阿婆主\", \"来到\", \"北京立方庭\", \"参观\", \"自然语义科技公司\", \"。\"]], tasks='ner').pretty_print()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "accelerator": "GPU",
  "colab": {
   "collapsed_sections": [],
   "name": "ner_restful.ipynb",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
